📊 LLM Evals
model evaluation, benchmarks, evals
Jankmarking: Janky Benchmarking
⚙️ Performance Profiling · williamangel.net · 3d · Hacker News
Why does AI memory fail at connecting facts? I ran the benchmarks to find out
⚙️ Systems Programming · yourmemoryai.xyz · 2d · Hacker News, r/SideProject
Mapping AI benchmarks onto a common capability scale
⌚ Quantified Self · aiiq.org · 1h · Hacker News
Open Source Robot Policies, Datasets, and Benchmarks
🤖 Game AI · festivus.hapticlabs.ai · 39m · Hacker News
Recursive Multi-Agent Systems
🤖 Agent-Based Simulations · recursivemas.github.io · 1d · Hacker News
ZAYA1-8B: An 8B Moe Model with 760M Active Params Matching DeepSeek-R1 on Math
🕸️ WASM · firethering.com · 5d · Hacker News
Interfaze: A new model architecture built for high accuracy at scale
💬 Prompt Engineering · interfaze.ai · 1d · Hacker News
Lies, damned lies, and Elastic's benchmarks
⚙️ Systems Programming · gouthamve.dev · 2d · Hacker News
ProgramBench: Can Language Models Rebuild Programs From Scratch?
🔧 Code Generation · arxiv.org · 6d · Hacker News
MySQL hypergraph optimizer
📈 Query Optimization · blog.sesse.net · 2d
Foundation Model Engineering: From theory to production
💬 Prompt Engineering · sungeuns.github.io · 4d · Hacker News
BintzGavin/apastra: Lightweight prompt versioning, evals, benchmarks, and delivery
💬 Prompt Engineering · github.com · 4d · Hacker News
We ran OWASP attacks on 8 LLMs. Optimized small models beat frontier defaults
🔐 Cybersecurity · megacode.ai · 5d · Hacker News
SubQ: A New LLM with a 12M Token Context That Rivals Claude and ChatGPT
💬 Prompt Engineering · felloai.com · 6d · Hacker News
Benchmarking AI agent retrieval strategies on Kubernetes bug fixes
📊 Performance Monitoring · cncf.io · 4d · Hacker News
We Ran 250 AI Agent Evals to Find Out if Skills Beat Docs. The Answer Is More Complicated Than We Expected
🤖 Creative Automation · wix.engineering · 6d · Hacker News
Show HN: Vibe code your agents without vibe coding your agent
💬 Prompt Engineering · deepeval.com · 3d · Hacker News
Optimize for change not application performance
⚙️ Performance Profiling · echooff.dev · 3d · Hacker News
hpke-ng: Faster, Smaller, Harder HPKE for Rust
⚙️ Systems Programming · symbolic.software · 4d · Lobsters, r/rust
e-Bike Fleet Monitoring
📊 Running Analytics · tech.marksblogg.com · 6d · Hacker News